Towards Comprehensive Web Search

Author

  • Erik Warren Selberg
Abstract

Chair of Supervisory Committee: Associate Professor Oren Etzioni, Department of Computer Science & Engineering

The World Wide Web has rapidly become a key medium for disseminating information to all members of society. However, its disorganized nature and sheer size can make it difficult for people to find information. Web search services have made a significant contribution towards enabling people to find information on the Web quickly. Unfortunately, as of this writing, no Web search service can conduct a comprehensive search of the Web for any topic. In addition, many major Web search services are unable to return a stable set of results. An intuitive assumption about any search service is that the results of a given query will remain unchanged unless either the documents referred to in the results change and become irrelevant, or better documents become available. Yet, because of a variety of real-world constraints and design choices, many search services are unstable: they intermittently omit relevant documents from search results even though those documents are contained in their indices.

One technique that could enable a search that is both more comprehensive and more stable is meta-search: conducting a single search using multiple search resources. This thesis examines the application of meta-search to the World Wide Web. It addresses four questions. Can meta-search provide a more comprehensive search than traditional Web searching? Is meta-search necessary now, and will it be necessary in the future? Can meta-search be implemented in a practical manner? And can meta-search enable a more stable search? We present MetaCrawler, a meta-search service, as a means of obtaining a more comprehensive search than existing Web search services.
MetaCrawler addresses some of the problems with Web search services by forwarding a user's query to multiple search services and combining their results into a single list.

In summary, MetaCrawler demonstrates that meta-search can be implemented in a manner such that average Web users will take advantage of its benefits, and that it can be provided and maintained with limited resources. Through experiments using Inference of User Value through Real-world data, a new methodology for evaluating search services, we conclude that MetaCrawler provides a significantly more comprehensive search than any single search service. Furthermore, the growth trends of the Web and of search service indices lead us to conclude that meta-search will be necessary to provide a comprehensive search in the future. Our experiments also demonstrate that most major search services are unstable and may omit relevant documents even when those documents are present in their indices. Finally, we conclude that search results can be made not only stable but also better through Collaborative Index Enhancement, a novel model for enhancing a searchable index based on the experience of previous users.
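The collation step described above, forwarding one query to several services and merging their ranked lists into one, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not MetaCrawler's actual algorithm: the abstract does not say how MetaCrawler scores or deduplicates results, so the reciprocal-rank scoring, the URL normalization rule, and the service names `ServiceA`/`ServiceB` are all hypothetical. Each service's results are assumed to arrive as a ranked list of URLs, best first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonicalize a URL so the same page returned by two services merges.

    Assumption: lowercasing scheme and host, dropping the fragment, and
    ignoring a trailing slash is enough to detect duplicates.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(result_lists: Dict[str, List[str]]) -> List[Tuple[str, float, List[str]]]:
    """Merge ranked URL lists from several services into one ranked list.

    Hypothetical scoring: a URL earns 1/rank from each service that
    returned it, so pages found by several services rise to the top.
    Returns (url, score, services that returned it), best first.
    """
    scores: Dict[str, float] = defaultdict(float)
    found_by: Dict[str, set] = defaultdict(set)
    for service, urls in result_lists.items():
        seen = set()  # ignore duplicates within a single service's list
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            if key in seen:
                continue
            seen.add(key)
            scores[key] += 1.0 / rank
            found_by[key].add(service)
    # Highest score first; break ties by URL so the order is deterministic.
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    return [(u, scores[u], sorted(found_by[u])) for u in ranked]


if __name__ == "__main__":
    # Hypothetical responses from two search services for the same query.
    responses = {
        "ServiceA": ["http://a.org/x", "http://b.org/", "http://c.org/"],
        "ServiceB": ["http://b.org", "http://d.org/"],
    }
    for url, score, services in merge_results(responses):
        print(f"{score:.2f}  {url}  {services}")
```

In this example `http://b.org` and `http://b.org/` collapse into a single entry, and because both services returned it, it ranks first with a score of 1.0 + 0.5 = 1.5. Rewarding agreement between services is only one simple way to exploit multiple sources; weighting services differently is an equally plausible design choice.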

Similar sources

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the increasing web, there are many challenges to establish a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, But the problem ...

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Scalability Challenges in Web Search Engines

Continuous growth of the Web and user bases forces web search engine companies to make costly investments on very large compute infrastructures. The scalability of these infrastructures require careful performance optimizations in every major component of the search engine. Herein, we try to provide a fairly comprehensive coverage of literature on scalability challenges in large-scale web searc...

A Comprehensive Review of Image Retrieval Based On Example Video Clip

In the recent years, with the usage of internet, there has been large amount of data resides on the web. Everyone is interested for accurate and fast retrieval search engines that retrieve images. This paper tries to present a comprehensive review and differentiate the various problems of image retrial techniques. This paper presents a survey of the most popular image retrieval techniques with ...

Towards a Model of Information Scatter: Implications for Search and Design

Recent studies suggest that users often retrieve incomplete healthcare information because of the complex and skewed distribution of facts across relevant webpages. To understand the causes for such skewed distributions, this paper presents the results of two analyses: (1) A distribution analysis discusses how facts related to healthcare topics are scattered across high-quality healthcare pages...

Publication date: 1999